Cluster analysis for optimal indexing

Tim Wylie
Michael A. Schuh
Rafal A. Angryk

Publication date

January 2013

Abstract

High-dimensional indexing is an important area of current research, especially for range and kNN queries. This work introduces clustering for the sake of indexing. The goal is to develop new clustering methods designed to optimize the data partitioning for an indexing-specific tree structure instead of finding data distribution-based clusters. We focus on iDistance, a state-of-the-art high-dimensional indexing method, and take a basic approach to solving this new problem. By utilizing spherical clusters in an unsupervised Expectation Maximization algorithm dependent upon local density and cluster overlap, we create a partitioning of the space providing balanced segmentation for a B+-tree. We also look at the novel idea of reclusterin...

